Goto

Collaborating Authors

 bayesian averaging


Bayesian Averaging is Well-Temperated

Neural Information Processing Systems

Bayesian predictions are stochastic just like predictions of any other inference scheme that generalize from a finite sample. While a sim(cid:173) ple variational argument shows that Bayes averaging is generaliza(cid:173) tion optimal given that the prior matches the teacher parameter distribution the situation is less clear if the teacher distribution is unknown. I define a class of averaging procedures, the temperated likelihoods, including both Bayes averaging with a uniform prior and maximum likelihood estimation as special cases. I show that Bayes is generalization optimal in this family for any teacher dis(cid:173) tribution for two learning problems that are analytically tractable: learning the mean of a Gaussian and asymptotics of smooth learn(cid:173) ers.


Bayesian Averaging is Well-Temperated

Neural Information Processing Systems

Often a learning problem has natural quantitative measure of generalization. If a loss function is defined the natural measure is the generalization error, i.e., the expected loss on a random sample independent of the training set. Generalizability is a key topic of learning theory and much progress has been reported. Analytic results for a broad class of machines can be found in the litterature [8, 12, 9, 10] describing the asymptotic generalization ability of supervised algorithms that are continuously parameterized. Asymptotic bounds on generalization for general machines have been advocated by Vapnik [11]. Generalization results valid for finite training sets can only be obtained for specific learning machines, see e.g.


Bayesian Averaging is Well-Temperated

Neural Information Processing Systems

Often a learning problem has natural quantitative measure of generalization. If a loss function is defined the natural measure is the generalization error, i.e., the expected loss on a random sample independent of the training set. Generalizability is a key topic of learning theory and much progress has been reported. Analytic results for a broad class of machines can be found in the litterature [8, 12, 9, 10] describing the asymptotic generalization ability of supervised algorithms that are continuously parameterized. Asymptotic bounds on generalization for general machines have been advocated by Vapnik [11]. Generalization results valid for finite training sets can only be obtained for specific learning machines, see e.g.


Bayesian Averaging is Well-Temperated

Neural Information Processing Systems

Often a learning problem has natural quantitative measure of generalization. If a loss function is defined the natural measure is the generalization error, i.e., the expected loss on a random sample independent of the training set. Generalizability is a key topic of learning theory and much progress has been reported. Analytic results for a broad class of machines can be found in the litterature [8, 12, 9, 10] describing the asymptotic generalization ability of supervised algorithms that are continuously parameterized. Asymptotic bounds on generalization for general machines havebeen advocated by Vapnik [11]. Generalization results valid for finite training sets can only be obtained for specific learning machines, see e.g.